On the creation of a pronunciation dictionary for Hungarian

Author

  • Stephen M. Grimes
Abstract

This report describes the process of creating a pronunciation dictionary and phonological lexicon for Hungarian to support linguistic research on Hungarian phonology and phonotactics. The pronunciation dictionary was created by transforming orthographic forms into pronunciation representations, taking advantage of the systematic deviations between Hungarian orthography and pronunciation. It is argued that the “automated” creation of such a dictionary can reasonably be expected to be accurate because Hungarian orthography is relatively close to actual pronunciation. This document discusses goals and standards for creating a Hungarian pronunciation dictionary and highlights each phonological change that creates a mismatch between orthography and pronunciation. Future developments and additions to the current dictionary are also suggested, as well as strategies for evaluating the quality of the dictionary. Finally, potential applications to linguistic research are discussed.
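As a rough illustration of the orthography-to-pronunciation transformation described in the abstract, the following is a minimal sketch only, not the report's actual rule set. The table `G2P`, the function `transcribe`, and the choice of IPA symbols are assumptions made for this example. It covers only the context-free letter-to-sound correspondences of standard Hungarian and none of the phonological changes (such as assimilation) that the report goes on to treat.

```python
import unicodedata

# Hypothetical grapheme-to-phoneme table (IPA). Letters not listed here
# (b, d, f, i, k, m, o, p, r, u, ...) are passed through unchanged.
G2P = {
    "dzs": "dʒ",
    "cs": "tʃ", "dz": "dz", "gy": "ɟ", "ly": "j", "ny": "ɲ",
    "sz": "s", "ty": "c", "zs": "ʒ",
    "a": "ɒ", "á": "aː", "c": "ts", "e": "ɛ", "é": "eː", "í": "iː",
    "ó": "oː", "ö": "ø", "ő": "øː", "s": "ʃ", "ú": "uː", "ü": "y", "ű": "yː",
}
MAX_LEN = max(len(g) for g in G2P)


def transcribe(word):
    """Greedy longest-match conversion of an orthographic form to a phone list."""
    # NFC makes sure accented vowels are single precomposed code points.
    word = unicodedata.normalize("NFC", word.lower())
    phones = []
    i = 0
    while i < len(word):
        # Try trigraphs first, then digraphs, then single letters.
        for size in range(MAX_LEN, 0, -1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in G2P:
                phones.append(G2P[chunk])
                i += size
                break
        else:
            # No rule for this letter: keep it as written.
            phones.append(word[i])
            i += 1
    return phones


print(transcribe("magyar"))
print(transcribe("szép"))
```

Greedy matching mis-segments words in which a letter sequence that looks like a digraph spans a morpheme boundary (for example the z and s of "község"), so a fuller system would need morphological information or an exception list.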

Similar resources

On the creation of a pronunciation dictionary for Hungarian

Recent research on the phonological structure of the mental lexicon has almost exclusively been based on the English mental lexicon. Linguists and psychologists have been especially interested in identifying what constitutes a phonological neighborhood and how a phonological neighborhood is influenced by word frequency (cf. String edit distance is typically used as a measure of phonological sim...

Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application

The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...

The Pronouncing Dictionary of Austrian German and the other Major Varieties of German - A Phonetic Resources Database on the Pronunciation of German

The paper gives a comprehensive overview on the project “Varieties of Austrian German Standard pronunciation and varieties of standard pronunciation” whose primary goal is the creation of a pronouncing dictionary of Austrian German and the creation of a large data base of audio samples for research on spoken language and different forms of pronunciation in Austria. The contents of the dictionar...

Effort and Accuracy during Language Resource Generation: A Pronunciation Prediction Case Study

When developing a language resource, there is generally a trade-off between the amount of effort invested in the resource creation process and the quality of the resulting resource. We argue that, in the developing world with its many resource-scarce languages, a ‘usable’ resource in multiple languages may be more valuable than a highly accurate resource for one language only. From this perspec...

Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding correct opinion or interest from multi-facet sentiment data is a tedious task. In this paper, a method to improve the sentiment accuracy by utilizing the concept of categorized dictionary for sentiment classification and analysis is proposed.  A categorized dictiona...

Publication date: 2007